Splice site prediction using stochastic regular grammars.
Authors
Abstract
This paper presents a novel approach to the problem of splice site prediction, by applying stochastic grammar inference. We used four grammar inference algorithms to infer 1465 grammars, and used 10-fold cross-validation to select the best grammar for each algorithm. The corresponding grammars were embedded into a classifier and used to run splice site prediction and compare the results with those of NNSPLICE, the predictor used by the Genie gene finder. We indicate possible paths to improve this performance by using Sakakibara's windowing technique to find probability thresholds that will lower false-positive predictions.
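As a rough illustration of the idea in the abstract, the sketch below scores sequence windows under a toy stochastic regular grammar (written as a probabilistic deterministic automaton) and picks a probability threshold that caps the false-positive rate on known negatives. Everything here is an assumption made for illustration: the states, the probabilities, and the names `log_prob`, `scan`, and `pick_threshold` are hypothetical, the grammar is not one of the inferred grammars from the paper, and the sliding window is a generic scan rather than a reproduction of Sakakibara's windowing technique.

```python
import math

# Hypothetical toy grammar: TRANSITIONS[state][symbol] = (next_state, probability).
# STOP[state] is the probability of ending the string in that state.
# For every state, the outgoing probabilities plus STOP[state] sum to 1.
TRANSITIONS = {
    0: {"A": (0, 0.2), "C": (0, 0.2), "G": (1, 0.4), "T": (0, 0.1)},
    1: {"A": (0, 0.1), "C": (0, 0.1), "G": (1, 0.1), "T": (2, 0.6)},
    2: {"A": (2, 0.3), "C": (2, 0.1), "G": (2, 0.3), "T": (2, 0.2)},
}
STOP = {0: 0.1, 1: 0.1, 2: 0.1}


def log_prob(seq, transitions=TRANSITIONS, stop=STOP, start=0):
    """Log-probability that the grammar generates exactly `seq`."""
    state, total = start, 0.0
    for sym in seq:
        if sym not in transitions[state]:
            return float("-inf")  # symbol not derivable from this state
        state, p = transitions[state][sym]
        total += math.log(p)
    return total + math.log(stop[state])


def scan(sequence, width, threshold):
    """Slide a fixed-width window and keep positions scoring above threshold."""
    hits = []
    for i in range(len(sequence) - width + 1):
        score = log_prob(sequence[i:i + width])
        if score > threshold:
            hits.append((i, score))
    return hits


def pick_threshold(negative_scores, max_fp_rate):
    """Return a threshold such that at most `max_fp_rate` of the known
    negatives score strictly above it."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = int(max_fp_rate * len(ranked))  # negatives we tolerate above the cut
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]


if __name__ == "__main__":
    threshold = pick_threshold([log_prob(w) for w in ("AA", "AG", "TA", "CC")], 0.0)
    print(scan("AAGTAA", 2, threshold))
```

Raising the threshold trades sensitivity for fewer false positives. In a real evaluation the negatives would come from held-out folds of the cross-validation rather than from a hand-picked list.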
Similar resources
PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars
Background: RNA molecules play many important regulatory, catalytic and structural ...
Stochastic modeling of RNA pseudoknotted structures: a grammatical approach
MOTIVATION: Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally a context-sensitive grammar is required, which would impose a large...
Feature subset selection for splice site prediction
MOTIVATION: The large amount of available annotated Arabidopsis thaliana sequences allows the induction of splice site prediction models with supervised learning algorithms (see Haussler (1998) for a review and references). These algorithms need information sources or features from which the models can be computed. For splice site prediction, the features we consider in this study are the presen...
Accurate Computation of the Relative Entropy Between Stochastic Regular Grammars
Works dealing with grammatical inference of stochastic grammars often evaluate the relative entropy between the model and the true grammar by means of large test sets generated with the true distribution. In this paper, an iterative procedure to compute the relative entropy between two stochastic deterministic regular grammars is proposed. Resumé Les travails sur l’inférence de grammaires stoch...
Stochastic Context-Free Grammars and RNA Secondary Structure Prediction
This thesis focuses on the prediction of RNA secondary structure using stochastic context-free grammars (SCFG). The RNA secondary structure prediction problem consists of predicting a 2-dimensional structure from a 1-dimensional nucleotide sequence. The theory behind SCFG is explained and an overview of the research literature on various methods in the field of secondary structure prediction is g...
Journal:
- Genetics and molecular research : GMR
Volume 6, Issue 1
Pages: -
Publication date: 2007